[GLUTEN-3598][CH] Support to config the hash algorithm for the ch shuffle hash partitioner #3604

Merged
merged 1 commit into from
Nov 3, 2023

@zzcclp zzcclp commented Nov 2, 2023

What changes were proposed in this pull request?

The ch shuffle hash partitioner currently uses cityHash64 as its hash algorithm, which differs from vanilla Spark. When the shuffle on one side of a join falls back to vanilla Spark, the two sides compute different hash ids for the same key. This PR adds a configuration to control the hash algorithm used by the ch shuffle hash partitioner.

Close #3598.

(Fixes: #3598)
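
To illustrate why the two sides must agree on the hash algorithm, here is a minimal Python sketch. It is not Gluten or Spark code: `hash_a` and `hash_b` are hypothetical stand-ins built on `hashlib` (they are not cityHash64 or Spark's hash), and `partition_id` and `mismatched_keys` are names made up for this example. The only assumption is that a partition id is derived as `hash(key) mod num_partitions`.

```python
import hashlib


def hash_a(key: str) -> int:
    """Hypothetical stand-in for the hash used on one side (NOT cityHash64)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hash_b(key: str) -> int:
    """Hypothetical stand-in for the hash used on the other side."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def partition_id(key: str, num_partitions: int, hash_fn) -> int:
    """Map a join key to a shuffle partition: hash(key) mod num_partitions."""
    return hash_fn(key) % num_partitions


def mismatched_keys(keys, num_partitions, left_hash, right_hash):
    """Return the keys that the two join sides would send to different partitions."""
    return [
        k
        for k in keys
        if partition_id(k, num_partitions, left_hash)
        != partition_id(k, num_partitions, right_hash)
    ]


keys = [f"k{i}" for i in range(1000)]

# Same algorithm on both sides: every key is co-located.
same = mismatched_keys(keys, 8, hash_a, hash_a)

# Different algorithms: many keys land in different partitions,
# so matching rows never meet in the same join task.
different = mismatched_keys(keys, 8, hash_a, hash_b)

print(len(same), len(different))
```

With mismatched algorithms, rows sharing a join key can end up in different partitions and the join silently drops matches, which is why a fallback on one side calls for a configurable hash on the other.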

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

github-actions bot commented Nov 2, 2023

#3598

github-actions bot commented Nov 2, 2023

Run Gluten Clickhouse CI

@lgbo-ustc

LGTM

@liuneng1994 liuneng1994 left a comment

LGTM

@zzcclp zzcclp merged commit 07ba657 into apache:main Nov 3, 2023
16 checks passed

@baibaichen baibaichen left a comment

murmurHash3_32 148025
cityHash64 152058

@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

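| query | log/native_3604_time.csv | log/native_master_11_02_2023_78104be3e_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 35.24 | 34.98 | -0.264 | 99.25% |
| q2 | 24.57 | 24.86 | 0.287 | 101.17% |
| q3 | 40.02 | 38.38 | -1.641 | 95.90% |
| q4 | 35.17 | 37.54 | 2.368 | 106.73% |
| q5 | 71.58 | 71.16 | -0.426 | 99.40% |
| q6 | 8.46 | 7.29 | -1.172 | 86.15% |
| q7 | 87.93 | 88.90 | 0.978 | 101.11% |
| q8 | 86.73 | 87.43 | 0.701 | 100.81% |
| q9 | 117.18 | 120.87 | 3.687 | 103.15% |
| q10 | 52.82 | 51.83 | -0.994 | 98.12% |
| q11 | 20.51 | 19.61 | -0.897 | 95.63% |
| q12 | 25.31 | 26.28 | 0.975 | 103.85% |
| q13 | 48.85 | 48.19 | -0.662 | 98.64% |
| q14 | 19.51 | 18.42 | -1.091 | 94.41% |
| q15 | 34.77 | 32.90 | -1.875 | 94.61% |
| q16 | 16.24 | 15.99 | -0.254 | 98.43% |
| q17 | 101.80 | 101.55 | -0.245 | 99.76% |
| q18 | 146.94 | 147.25 | 0.301 | 100.21% |
| q19 | 16.90 | 17.00 | 0.093 | 100.55% |
| q20 | 31.22 | 31.82 | 0.609 | 101.95% |
| q21 | 226.75 | 226.44 | -0.314 | 99.86% |
| q22 | 13.40 | 13.39 | -0.003 | 99.98% |
| total | 1261.91 | 1262.07 | 0.159 | 100.01% |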
